{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Build a Qusetion Answering App with Denser Retriever"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a target=\"_blank\" href=\"https://colab.research.google.com/github/denser-org/denser-retriever/blob/develop/tutorials/question_answering/question_answering.ipynb\">\n",
    "  <img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/>\n",
    "</a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This notebook illustrates how to build a question answering app from scratch using [DenserRetriever](https://github.com/denser-org/denser-retriever). DenserRetriever is an enterprise-grade AI retriever designed to streamline AI integration into your applications, ensuring cutting-edge accuracy."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Preparations"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Install Dependencies"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First, we need to install denser-retriever python library."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install git+https://github.com/denser-org/denser-retriever.git#main\n",
    "!pip install grpcio==1.60.1\n",
    "!pip install grpcio-tools==1.60.1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Prepare the Data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There is a subset of the  [InsuranceQA Corpus](https://github.com/shuzi/insuranceQA)  (1000 pairs of questions and answers) used in this demo, everyone can download on [Github](https://github.com/towhee-io/examples/releases/download/data/question_answer.csv)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "! curl -L https://github.com/towhee-io/examples/releases/download/data/question_answer.csv -O"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**question_answer.csv**: a file containing question and the answer.\n",
    "\n",
    "Let's take a quick look:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.google.colaboratory.intrinsic+json": {
       "summary": "{\n  \"name\": \"df\",\n  \"rows\": 1000,\n  \"fields\": [\n    {\n      \"column\": \"id\",\n      \"properties\": {\n        \"dtype\": \"number\",\n        \"std\": 288,\n        \"min\": 0,\n        \"max\": 999,\n        \"num_unique_values\": 1000,\n        \"samples\": [\n          521,\n          737,\n          740\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"question\",\n      \"properties\": {\n        \"dtype\": \"string\",\n        \"num_unique_values\": 1000,\n        \"samples\": [\n          \"Can  You  Cash  In  A  Universal  Life  Insurance  Policy?\",\n          \"How  Much  Does  Basic  Renters  Insurance  Cost?\",\n          \"Where  To  Get  Cheap  Renters  Insurance?\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"answer\",\n      \"properties\": {\n        \"dtype\": \"string\",\n        \"num_unique_values\": 1000,\n        \"samples\": [\n          \"Yes, you can cash in a universal life policy. Discuss the ramifications of your decision with your agent, first. If you do cash it in, you can roll the cash value over into a new permanent policy without paying taxes on the cash. To cash it in, you call the company through which it was issued.\",\n          \"It will depend on how the divorce is structured. As an Insurance Agent and not an attorney I won't attempt to tell you what the laws says regarding divorce. What I will offer is some suggestions as to what could happen; 1) Nothing - present insurance in force prior to a divorce does not change because of the divorce 2) Judge assigns insurance policy - a judge could order that the beneficiary for the policy be changed to support a divorcing spouse or children. He could also issue instructions as to how the cash value of a policy is to be distributed. I would consult an attorney for a more detailed answer.\",\n          \"Is Northwestern Mutual whole Life a good investment? Well it depends on your needs. Your needs are what make any choice a good or bad investment. First Northwestern Mutual is a very highly rated fiscally strong company, as many others are. Using Whole life as an investment does work as part of an overall investment strategy as it provides a safe place to accumulate money as well as protecting your other plans. What you will find is whole life insurance is the most boring investment product out there. It's not exciting, it's not racy, it's not an up and down product. It is a slow safe and secure place to put money out there. It's boring because it does what it says it will do. I have owned whole life for almost 30 years and it has never let me down. Can't say the same about my real estate or stock market investments during that same window of time. Oh, last thing to keep it short. Whole life cash values don't go backwards. You can't lose it once it's there.\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    }\n  ]\n}",
       "type": "dataframe",
       "variable_name": "df"
      },
      "text/html": [
       "\n",
       "  <div id=\"df-84c66fed-df8f-48ac-b6c2-80f827617d34\" class=\"colab-df-container\">\n",
       "    <div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>question</th>\n",
       "      <th>answer</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>Is  Disability  Insurance  Required  By  Law?</td>\n",
       "      <td>Not generally. There are five states that requ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>Can  Creditors  Take  Life  Insurance  After  ...</td>\n",
       "      <td>If the person who passed away was the one with...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2</td>\n",
       "      <td>Does  Travelers  Insurance  Have  Renters  Ins...</td>\n",
       "      <td>One of the insurance carriers I represent is T...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3</td>\n",
       "      <td>Can  I  Drive  A  New  Car  Home  Without  Ins...</td>\n",
       "      <td>Most auto dealers will not let you drive the c...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>4</td>\n",
       "      <td>Is  The  Cash  Surrender  Value  Of  Life  Ins...</td>\n",
       "      <td>Cash surrender value comes only with Whole Lif...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>\n",
       "    <div class=\"colab-df-buttons\">\n",
       "\n",
       "  <div class=\"colab-df-container\">\n",
       "    <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-84c66fed-df8f-48ac-b6c2-80f827617d34')\"\n",
       "            title=\"Convert this dataframe to an interactive table.\"\n",
       "            style=\"display:none;\">\n",
       "\n",
       "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
       "    <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
       "  </svg>\n",
       "    </button>\n",
       "\n",
       "  <style>\n",
       "    .colab-df-container {\n",
       "      display:flex;\n",
       "      gap: 12px;\n",
       "    }\n",
       "\n",
       "    .colab-df-convert {\n",
       "      background-color: #E8F0FE;\n",
       "      border: none;\n",
       "      border-radius: 50%;\n",
       "      cursor: pointer;\n",
       "      display: none;\n",
       "      fill: #1967D2;\n",
       "      height: 32px;\n",
       "      padding: 0 0 0 0;\n",
       "      width: 32px;\n",
       "    }\n",
       "\n",
       "    .colab-df-convert:hover {\n",
       "      background-color: #E2EBFA;\n",
       "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
       "      fill: #174EA6;\n",
       "    }\n",
       "\n",
       "    .colab-df-buttons div {\n",
       "      margin-bottom: 4px;\n",
       "    }\n",
       "\n",
       "    [theme=dark] .colab-df-convert {\n",
       "      background-color: #3B4455;\n",
       "      fill: #D2E3FC;\n",
       "    }\n",
       "\n",
       "    [theme=dark] .colab-df-convert:hover {\n",
       "      background-color: #434B5C;\n",
       "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
       "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
       "      fill: #FFFFFF;\n",
       "    }\n",
       "  </style>\n",
       "\n",
       "    <script>\n",
       "      const buttonEl =\n",
       "        document.querySelector('#df-84c66fed-df8f-48ac-b6c2-80f827617d34 button.colab-df-convert');\n",
       "      buttonEl.style.display =\n",
       "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
       "\n",
       "      async function convertToInteractive(key) {\n",
       "        const element = document.querySelector('#df-84c66fed-df8f-48ac-b6c2-80f827617d34');\n",
       "        const dataTable =\n",
       "          await google.colab.kernel.invokeFunction('convertToInteractive',\n",
       "                                                    [key], {});\n",
       "        if (!dataTable) return;\n",
       "\n",
       "        const docLinkHtml = 'Like what you see? Visit the ' +\n",
       "          '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
       "          + ' to learn more about interactive tables.';\n",
       "        element.innerHTML = '';\n",
       "        dataTable['output_type'] = 'display_data';\n",
       "        await google.colab.output.renderOutput(dataTable, element);\n",
       "        const docLink = document.createElement('div');\n",
       "        docLink.innerHTML = docLinkHtml;\n",
       "        element.appendChild(docLink);\n",
       "      }\n",
       "    </script>\n",
       "  </div>\n",
       "\n",
       "\n",
       "<div id=\"df-4401dfd5-8fc4-47b3-82e8-9fb9921cc1a1\">\n",
       "  <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-4401dfd5-8fc4-47b3-82e8-9fb9921cc1a1')\"\n",
       "            title=\"Suggest charts\"\n",
       "            style=\"display:none;\">\n",
       "\n",
       "<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
       "     width=\"24px\">\n",
       "    <g>\n",
       "        <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
       "    </g>\n",
       "</svg>\n",
       "  </button>\n",
       "\n",
       "<style>\n",
       "  .colab-df-quickchart {\n",
       "      --bg-color: #E8F0FE;\n",
       "      --fill-color: #1967D2;\n",
       "      --hover-bg-color: #E2EBFA;\n",
       "      --hover-fill-color: #174EA6;\n",
       "      --disabled-fill-color: #AAA;\n",
       "      --disabled-bg-color: #DDD;\n",
       "  }\n",
       "\n",
       "  [theme=dark] .colab-df-quickchart {\n",
       "      --bg-color: #3B4455;\n",
       "      --fill-color: #D2E3FC;\n",
       "      --hover-bg-color: #434B5C;\n",
       "      --hover-fill-color: #FFFFFF;\n",
       "      --disabled-bg-color: #3B4455;\n",
       "      --disabled-fill-color: #666;\n",
       "  }\n",
       "\n",
       "  .colab-df-quickchart {\n",
       "    background-color: var(--bg-color);\n",
       "    border: none;\n",
       "    border-radius: 50%;\n",
       "    cursor: pointer;\n",
       "    display: none;\n",
       "    fill: var(--fill-color);\n",
       "    height: 32px;\n",
       "    padding: 0;\n",
       "    width: 32px;\n",
       "  }\n",
       "\n",
       "  .colab-df-quickchart:hover {\n",
       "    background-color: var(--hover-bg-color);\n",
       "    box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
       "    fill: var(--button-hover-fill-color);\n",
       "  }\n",
       "\n",
       "  .colab-df-quickchart-complete:disabled,\n",
       "  .colab-df-quickchart-complete:disabled:hover {\n",
       "    background-color: var(--disabled-bg-color);\n",
       "    fill: var(--disabled-fill-color);\n",
       "    box-shadow: none;\n",
       "  }\n",
       "\n",
       "  .colab-df-spinner {\n",
       "    border: 2px solid var(--fill-color);\n",
       "    border-color: transparent;\n",
       "    border-bottom-color: var(--fill-color);\n",
       "    animation:\n",
       "      spin 1s steps(1) infinite;\n",
       "  }\n",
       "\n",
       "  @keyframes spin {\n",
       "    0% {\n",
       "      border-color: transparent;\n",
       "      border-bottom-color: var(--fill-color);\n",
       "      border-left-color: var(--fill-color);\n",
       "    }\n",
       "    20% {\n",
       "      border-color: transparent;\n",
       "      border-left-color: var(--fill-color);\n",
       "      border-top-color: var(--fill-color);\n",
       "    }\n",
       "    30% {\n",
       "      border-color: transparent;\n",
       "      border-left-color: var(--fill-color);\n",
       "      border-top-color: var(--fill-color);\n",
       "      border-right-color: var(--fill-color);\n",
       "    }\n",
       "    40% {\n",
       "      border-color: transparent;\n",
       "      border-right-color: var(--fill-color);\n",
       "      border-top-color: var(--fill-color);\n",
       "    }\n",
       "    60% {\n",
       "      border-color: transparent;\n",
       "      border-right-color: var(--fill-color);\n",
       "    }\n",
       "    80% {\n",
       "      border-color: transparent;\n",
       "      border-right-color: var(--fill-color);\n",
       "      border-bottom-color: var(--fill-color);\n",
       "    }\n",
       "    90% {\n",
       "      border-color: transparent;\n",
       "      border-bottom-color: var(--fill-color);\n",
       "    }\n",
       "  }\n",
       "</style>\n",
       "\n",
       "  <script>\n",
       "    async function quickchart(key) {\n",
       "      const quickchartButtonEl =\n",
       "        document.querySelector('#' + key + ' button');\n",
       "      quickchartButtonEl.disabled = true;  // To prevent multiple clicks.\n",
       "      quickchartButtonEl.classList.add('colab-df-spinner');\n",
       "      try {\n",
       "        const charts = await google.colab.kernel.invokeFunction(\n",
       "            'suggestCharts', [key], {});\n",
       "      } catch (error) {\n",
       "        console.error('Error during call to suggestCharts:', error);\n",
       "      }\n",
       "      quickchartButtonEl.classList.remove('colab-df-spinner');\n",
       "      quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n",
       "    }\n",
       "    (() => {\n",
       "      let quickchartButtonEl =\n",
       "        document.querySelector('#df-4401dfd5-8fc4-47b3-82e8-9fb9921cc1a1 button');\n",
       "      quickchartButtonEl.style.display =\n",
       "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
       "    })();\n",
       "  </script>\n",
       "</div>\n",
       "\n",
       "    </div>\n",
       "  </div>\n"
      ],
      "text/plain": [
       "   id                                           question  \\\n",
       "0   0      Is  Disability  Insurance  Required  By  Law?   \n",
       "1   1  Can  Creditors  Take  Life  Insurance  After  ...   \n",
       "2   2  Does  Travelers  Insurance  Have  Renters  Ins...   \n",
       "3   3  Can  I  Drive  A  New  Car  Home  Without  Ins...   \n",
       "4   4  Is  The  Cash  Surrender  Value  Of  Life  Ins...   \n",
       "\n",
       "                                              answer  \n",
       "0  Not generally. There are five states that requ...  \n",
       "1  If the person who passed away was the one with...  \n",
       "2  One of the insurance carriers I represent is T...  \n",
       "3  Most auto dealers will not let you drive the c...  \n",
       "4  Cash surrender value comes only with Whole Lif...  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "\n",
    "df = pd.read_csv('question_answer.csv')\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Download xgboost model file:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current\n",
      "                                 Dload  Upload   Total   Spent    Left  Speed\n",
      "100  410k  100  410k    0     0   902k      0 --:--:-- --:--:-- --:--:--  900k\n"
     ]
    }
   ],
   "source": [
    "!curl -L https://raw.githubusercontent.com/denser-org/denser-retriever/main/experiments/models/msmarco_xgb_es%2Bvs%2Brr_n.json -O"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Download config file:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!curl -L https://raw.githubusercontent.com/denser-org/denser-retriever/develop/tutorials/question_answering/config.yaml -O"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Start ElasticSearch instance"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Elasticsearch and Milvus are required to run the Denser Retriever. They support the keyword search and vector search respectively."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### ElasticSearch\n",
    "\n",
    "Download elasticsearch."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash\n",
    "\n",
    "rm -rf elasticsearch-8.14.*\n",
    "wget -q https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-8.14.3-linux-x86_64.tar.gz\n",
    "wget -q https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-8.14.3-linux-x86_64.tar.gz.sha512\n",
    "tar -xzf elasticsearch-8.14.3-linux-x86_64.tar.gz\n",
    "sudo chown -R daemon:daemon elasticsearch-8.14.3/\n",
    "shasum -a 512 -c elasticsearch-8.14.3-linux-x86_64.tar.gz.sha512\n",
    "umount /sys/fs/cgroup\n",
    "apt install cgroup-tools"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Disable security settings:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!echo \"xpack.security.enabled: false\" >> elasticsearch-8.14.3/config/elasticsearch.yml"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Start elasticsearch."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash --bg\n",
    "\n",
    "sudo -H -u daemon ./elasticsearch-8.14.3/bin/elasticsearch"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*Once* the instance has been started, grep for `elasticsearch` in the processes list to confirm the availability."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# This part is important, since it takes a little amount of time for instance to load\n",
    "import time\n",
    "time.sleep(20)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "root       11590   11588  0 12:41 ?        00:00:00 sudo -H -u daemon ./elasticsearch-8.14.3/bin/ela\n",
      "daemon     11591   11590 10 12:41 ?        00:00:05 /content/elasticsearch-8.14.3/jdk/bin/java -Xms4\n",
      "daemon     11661   11591 99 12:41 ?        00:00:47 /content/elasticsearch-8.14.3/jdk/bin/java -Des.\n",
      "daemon     11697   11661  0 12:41 ?        00:00:00 /content/elasticsearch-8.14.3/modules/x-pack-ml/\n",
      "root       11929   11927  0 12:42 ?        00:00:00 grep elastic\n"
     ]
    }
   ],
   "source": [
    "%%bash\n",
    "\n",
    "ps -ef | grep elastic"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Build a Denser Retriever"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this section, we will show how to build our question answering engine using DenserRetriever. The basic idea behind question answering is to use Langchain text splitting to generate embedding from the question dataset and compare the input question with the embedding stored in Milvus."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Generate passages"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We first generate passages from the question dataset. We use the `DenserRetriever` to generate the embeddings of the questions and store them in Milvus."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_community.document_loaders import CSVLoader\n",
    "from langchain_text_splitters import RecursiveCharacterTextSplitter\n",
    "from denser_retriever.utils import save_HF_docs_as_denser_passages\n",
    "\n",
    "# Generate text chunks\n",
    "documents = CSVLoader(\"question_answer.csv\").load()\n",
    "text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)\n",
    "texts = text_splitter.split_documents(documents)\n",
    "passage_file = \"passages.jsonl\"\n",
    "save_HF_docs_as_denser_passages(texts, passage_file, 0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Ingest by DenserRetriever"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we can build a DenserRetriever with the generated `passages.jsonl`.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.\n",
      "  warnings.warn(\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2024-07-11 12:43:10 INFO: ES analysis default\n",
      "2024-07-11 12:43:13 INFO: ES ingesting passages.jsonl record 2087\n",
      "2024-07-11 12:43:14 INFO: Done building ES index\n"
     ]
    }
   ],
   "source": [
    "from denser_retriever.retriever_general import RetrieverGeneral\n",
    "\n",
    "\n",
    "retriever_denser = RetrieverGeneral(\"question_answer\", \"config.yaml\")\n",
    "retriever_denser.ingest(\"passages.jsonl\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Query test"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that embedding for question dataset have been ingested by DenserRetriever, we can ask question with retrieve function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.\n",
      "  warnings.warn(\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2024-07-11 13:04:56 INFO: ElasticSearch passages: 100 time: 0.082 sec. \n",
      "2024-07-11 13:04:56 INFO: Rerank time: 0.217 sec.\n",
      "{\n",
      "    \"source\": \"question_answer.csv\",\n",
      "    \"text\": \"id: 0\\nquestion: Is  Disability  Insurance  Required  By  Law?\\nanswer: Not generally. There are five states that require most all employers carry short term disability insurance on their employees. These states are: California, Hawaii, New Jersey, New York, and Rhode Island. Besides this mandatory short term disability law, there is no other legislative imperative for someone to purchase or be covered by disability insurance.\",\n",
      "    \"title\": \"\",\n",
      "    \"pid\": 0,\n",
      "    \"score\": 13.668355670776368\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "import json\n",
    "from denser_retriever.retriever_general import RetrieverGeneral\n",
    "\n",
    "retriever_denser = RetrieverGeneral(\"question_answer\", \"config.yaml\")\n",
    "\n",
    "query = \"Is Disability Insurance Required By Law?\"\n",
    "passages, docs = retriever_denser.retrieve(query, {})\n",
    "print(json.dumps(passages[0], indent=4))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Integrate in chat\n",
    "\n",
    "Now we can add our retriever into a chatbot. We will use `towhee` and `gradio` to launch a simple chatbot."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install towhee gradio"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Setting queue=True in a Colab notebook requires sharing enabled. Setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).\n",
      "\n",
      "Colab notebook detected. To show errors in colab notebook, set debug=True in launch()\n",
      "Running on public URL: https://91e8c262c01b49c6b1.gradio.live\n",
      "\n",
      "This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div><iframe src=\"https://91e8c262c01b49c6b1.gradio.live\" width=\"100%\" height=\"500\" allow=\"autoplay; camera; microphone; clipboard-read; clipboard-write;\" frameborder=\"0\" allowfullscreen></iframe></div>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "import gradio as gr\n",
    "from towhee import pipe\n",
    "\n",
    "def chat(message, history):\n",
    "    history = history or []\n",
    "    ans_pipe = (\n",
    "        pipe.input('question')\n",
    "            .map('question', 'res', lambda x: retriever_denser.retrieve(x, {}))\n",
    "            .map('res', 'answer', lambda x: x[0])\n",
    "            .output('question', 'answer')\n",
    "    )\n",
    "\n",
    "    response = ans_pipe(message).get()[1][0]['text'].split('answer:')[1]\n",
    "    yield response\n",
    "\n",
    "# chat(\"Is Disability Insurance Required By Law?\", [])\n",
    "\n",
    "chatbot = gr.ChatInterface(chat).queue()\n",
    "\n",
    "if __name__ == \"__main__\":\n",
    "    chatbot.launch()\n"
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
